Structural quantities such as order parameters and correlation functions are often employed to 
gain insight into the physical behavior and properties of condensed matter systems. While standard 
quantities for characterizing structure exist, they are often insufficient for treating problems in 
the emerging field of nano- and microscale self-assembly, where the structures encountered may be 
complex and unusual. The computer science field of "shape matching" offers a robust solution to 
this problem by defining diverse methods for quantifying the similarity between arbitrarily complex 
shapes. Most order parameters and correlation functions used in condensed matter apply a specific 
measure of structural similarity within the context of a broader scheme. By substituting shape 
matching quantities for traditional quantities, we retain the essence of the broader scheme, but 
extend its applicability to more complex structures. Here we review some standard shape matching 
techniques and discuss how they might be used to create highly flexible structural metrics for 
diverse systems such as self-assembled matter. We provide three proof-of-concept example problems 
applying shape matching methods to identifying local and global structures, and tracking structural 
transitions in complex assembled systems. The shape matching methods reviewed here are applicable 
to a wide range of condensed matter systems, both simulated and experimental, provided particle 
positions are known or can be accurately imaged. 

I. INTRODUCTION 

The proliferation of new nanometer- and micron-sized colloidal particles of nearly arbitrary shape, composition 
and interaction has made possible the self-assembly of exquisitely complex structures with potential uses in a variety 
of technologies [IHl]. Because material properties and behavior are determined by both the global and local shapes, or 
patterns, within the self-assembled structure [U [SHI], methods and tools are needed to characterize the salient structural 
features of the assemblies. The field of condensed matter physics has traditionally led the way in developing algorithms 
for characterizing crystal structures and constructing theories to connect these structures to thermodynamics and to 
overall system properties |10lfT^. These approaches typically involve constructing structural order parameters and/or 
correlation functions that can discriminate between different building block arrangements, and they are well developed for 
systems of point-like, rod-like and spherical particles [13-17]. Examples include nematic and smectic order parameters 
for systems of rods [nematic, smectic, liquidcrystals] and bond order parameters |141 [TSl [T51 [TO] for 2d and 3d systems 
of spheres. 

However, these functions fail, in many cases, to fully describe the structural complexity of assemblies of more unusual 
nanocolloids, including those formed from spherical particles [71 [20], rod-like particles [Sl [21], polyhedral particles [25]-[55], 
colloidal molecules [51 15DH5I], patchy spheres [551110], arbitrarily-shaped objects [Tl[3], polymer-tethered nanoparticles 
[H [53 SIHIS], and terminal assemblies resembling biological structures [311 HZ]. For example, it is easy to envision 
that order parameters defined for spherical or rod-shaped particles may fail when applied to particles with more complex 
shapes, such as "Y"-shaped particles or triangular plates [3]. As a result of the increased complexity of nano building 
blocks, there are few "model problems" in nano- and microscale self-assembly for which generally applicable order 
parameters can be defined. The dearth of structural metrics has led many recent experimental and computational 
studies of assembled systems to rely heavily on visual inspection or ad hoc analysis for characterizing structures, 
rather than on well-established schemes. This approach is not optimal, since visual inspection can be time-consuming 
and is typically less accurate than mathematical analysis, and ad hoc analysis can be idiosyncratic, making it difficult to 
compare structures across independent studies. The impetus for new structural metrics is also driven by advances in 
microscopy techniques that allow for the direct imaging of nano- and microscale systems, which have greatly extended 
the range of systems for which detailed structural analysis can potentially be performed. For example, the tracking 
of micron-sized colloidal particles in 2d and 3d is now routine [151155], and high-fidelity imaging of nanoparticles [55] 
and their assemblies 23, .54, ^ is steadily improving. Combined with proper image processing techniques, one can 
extract much information about structure, such as the particle positions [481 152j and other key features, providing 
detailed structural information on par with simulations. Assuming one can construct order parameters sensitive to 
these unique building blocks and their assemblies, similar routines can be applied to both experimental and simulated 
systems, allowing for direct comparison [23, 33]. 

Analysis techniques from the computer science field of "shape matching" offer a potentially powerful solution to the 
problem of creating general structural metrics for these systems. Shape matching involves defining general structural 
metrics that can be used to measure the degree of similarity between diverse shapes. Such similarity measures can 
be applied within the context of traditional condensed matter order parameter and correlation function schemes 
to obtain analogous quantities for more complex structures. This is possible because, in practice, most standard 
structural characterization schemes include an implicit concept of matching or shape similarity; that is, the schemes 
typically measure the degree to which a structure of interest matches another (often ideal) structure. As a familiar 
example, consider the standard nematic order parameter, which takes its optimal value of 1 when the rod-like particles 
within the system are perfectly aligned, and 0 when the rods have random orientations. In this case, the order parameter 
measures the degree to which the local arrangement of rods in the system, described mathematically by the angles 
between neighboring rods, matches an ideal reference system with perfect alignment (see Fig. 1). Other structural 
characterization schemes and spatial or temporal correlation functions involve similar underlying concepts of matching. 
As we will discuss, by modifying these schemes to use shape matching methods, we retain their overall physical insight 
but gain the ability to apply them to complex structures. Although we focus exclusively on simulated assembled 
systems here, these types of methods are general enough to be applied to particle systems in general, 
provided that the particle positions and/or orientations can be determined or imaged. Examples of systems, both 
experimental and simulated, to which shape matching methods can potentially be applied include, but are not limited to, 
nanoparticle superlattices created from mixtures of spherical and/or non-spherical nanoparticles jS^ 155), microphase-separated 
systems, such as tethered nanoparticles and block copolymers that form crystalline and quasicrystalline 
domains [56, 57], colloidal ionic crystals [58], dense colloids [49] and granular matter [59, 60]. 
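To make the mapping between the nematic order parameter and a matching scheme concrete, the following minimal Python sketch computes the order parameter as the average of P2(cos θ) between each rod axis (query structure) and a director (ideal reference). It assumes the director is already known and passed in explicitly; in practice it is often taken as the eigenvector belonging to the largest eigenvalue of the nematic order tensor. The function names are illustrative only.

```python
import math


def legendre_p2(x):
    """Second Legendre polynomial, P2(x) = (3x^2 - 1) / 2."""
    return 0.5 * (3.0 * x * x - 1.0)


def _unit(v):
    """Return v scaled to unit length (v must be non-zero)."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def nematic_order(rod_axes, director):
    """Average of P2(cos theta) between each rod axis and the director.

    rod_axes: iterable of 3-vectors (the query structure); need not be normalized.
    director: 3-vector defining the ideal, perfectly aligned reference.
    P2 is even in cos(theta), so the head/tail sign of a rod does not matter.
    """
    n = _unit(director)
    values = [legendre_p2(sum(a * b for a, b in zip(_unit(axis), n)))
              for axis in rod_axes]
    return sum(values) / len(values)


# Perfect alignment gives 1; rods perpendicular to the director give -0.5.
print(nematic_order([(0, 0, 1), (0, 0, -2)], (0, 0, 1)))  # 1.0
print(nematic_order([(1, 0, 0), (0, 1, 0)], (0, 0, 1)))   # -0.5
```

Rods with random orientations in 3d give values scattered around 0, consistent with the limits quoted above.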




[Fig. 1 panel labels: query structure, reference structure, shape descriptor, similarity metric]



FIG. 1: Example of an implicit shape matching scheme within the context of a standard order parameter. The panel depicts 
the process of computing the nematic order parameter P2 for a system of rod-like colloidal ellipsoids that assemble into an 
aligned ordered phase [61]. In the language of a shape matching scheme (see section II), the colloidal system acts as a "query 
structure" that we wish to characterize. An ideal system for which the rods are all oriented along the average global director 
acts as an implicit "reference structure." The local values of the angles θ between rods in the query structure and reference 
structure act as "shape descriptors." The Legendre polynomial P2 acts as a "similarity metric." The global nematic order 
parameter P2 is computed by averaging the local values P2(cos θ). 



This review is organized as follows. In section II we review shape matching methods from the literature, restricting 
our scope to methods that we believe are most immediately applicable to assembled systems. We describe how 
representative shapes can be extracted from particle systems, review the shape descriptors that are best suited to 
describe these shapes numerically, and show how they can be compared quantitatively. In section III we apply a 
prototype shape matching scheme to three representative example problems from simulations of self-assembly. Our 
examples include identifying global structures in a microphase-separating system of polymer-tethered nanospheres [56], 
detecting local icosahedral clusters in a fluid of hard tetrahedral particles [52], and tracking the twisting of a helical 
sheet formed from polymer-tethered nanorods [62]. In section IV we suggest new applications for shape matching 
methods, including constructing correlation functions, measuring local crystal grains and crystal defects, devising 
guided computer algorithms to map parameter spaces and search for target structures, and grouping and classifying 
structures based on particular structural features. To aid in the development and dissemination of new structural 
analysis methods based on shape matching techniques, we provide accompanying software and examples via the 
web [63]. 



II. SHAPE MATCHING 



Quantifying how well structures match has been generalized in the context of shape matching [Bl] (see Fig. 2). 
Familiar applications include matching fingerprints and signatures [64], facial recognition [65], and medical imaging [66]. 
Shape matching defines the concept of the shape descriptor, a numerical "fingerprint" that describes a pattern or 
shape. Shape descriptors are computed for query structures and compared with those of reference structures. The degree of 
matching between query and reference structures is quantified by a similarity metric. 
FIG. 2: Data flow diagram for shape matching. (a) A structural pattern is extracted for a given query structure and then 
indexed into a shape descriptor, which represents a numerical fingerprint for the structure. (b) The shape descriptor is then 
compared with shape descriptors for reference structures to give a measure of similarity between shapes. Depending on how 
we choose the query and reference structures, the similarity value obtained may be applied to constructing order parameters, 
correlation functions, or other applications. 



Matching information can be used to create order parameters and correlation functions, identify structures, and 
perform many other types of structural analysis. Since we can choose virtually any structure as a reference for 
comparison, shape matching facilitates the creation of highly specific structural metrics. In the following sections, we 
review the process of constructing a customized structural metric which involves choosing interesting structures to 
characterize, computing shape descriptors, and using similarity metrics to compare them. 
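The following Python sketch illustrates this pipeline under simple assumptions of our own: a hypothetical descriptor (the sorted list of pairwise distances, which is invariant under translations, rotations and reflections), a similarity metric that maps the RMS difference between descriptors onto (0, 1], and a small library of reference structures. It is meant only to show how the three ingredients fit together, not as a recommended descriptor.

```python
import math
from itertools import combinations


def distance_descriptor(points):
    """Hypothetical descriptor: sorted pairwise distances between points.

    Invariant under translations, rotations and reflections, but not scaling.
    """
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))


def similarity(desc_a, desc_b):
    """Map the RMS difference of two descriptors onto (0, 1]; 1 means identical."""
    if len(desc_a) != len(desc_b):
        return 0.0  # different point counts: treat as no match
    if not desc_a:
        return 1.0  # two single points (empty descriptors) match trivially
    rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(desc_a, desc_b)) / len(desc_a))
    return 1.0 / (1.0 + rms)


def best_match(query, references):
    """Compare a query structure against a dict of named reference structures."""
    q = distance_descriptor(query)
    scores = {name: similarity(q, distance_descriptor(ref))
              for name, ref in references.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


library = {"square": [(0, 0), (1, 0), (1, 1), (0, 1)],
           "line": [(0, 0), (1, 0), (2, 0), (3, 0)]}
print(best_match([(5, 5), (5, 6), (4, 6), (4, 5)], library))  # ('square', 1.0)
```

Because the query is a translated and rotated unit square, it matches the "square" reference perfectly; swapping in a different descriptor or metric leaves the surrounding logic unchanged.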



A. Representative Structural Patterns 



Before we can compute a shape descriptor, we must extract a representative structural pattern from the system. 
This step relies largely on physical intuition; redundant or unimportant structural information can often be discarded 
out of hand to ensure that the matching scheme is only sensitive to important structural features. One standard type 
of coarse-graining that is often employed, particularly in the case of small clusters of roughly spherical particles, is 
to consider particle positions exclusively, discarding information regarding particle sizes and shapes, which may be 
nearly identical (see Fig. 3). This type of coarse-graining can also be applied to more complex morphologies, such 
as structures assembled from polyhedral building blocks [12], or hierarchical assemblies such as micellar systems [571-IBU] 
or virus capsids [lSl |37], wherein the building blocks assemble into larger structural sub-units that arrange into 
superstructures. In such cases, the representative structural pattern is given by the positions of the assembled sub-units, 
rather than those of the individual building blocks (detailed in Example 1 in section III below). 
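A minimal sketch of this coarse-graining step is shown below. It assumes each particle has already been assigned a sub-unit label (for example by a separate cluster analysis) and ignores periodic boundaries; `subunit_centers` is a hypothetical helper name.

```python
def subunit_centers(positions, labels):
    """Replace particles by the centroid of the sub-unit each belongs to.

    positions: sequence of coordinate tuples.
    labels: sub-unit id for each particle (assumed to come from a prior
            cluster analysis). Periodic boundaries are ignored, so a
            sub-unit must not straddle the edge of the simulation box.
    Returns {label: centroid tuple}.
    """
    groups = {}
    for pos, label in zip(positions, labels):
        groups.setdefault(label, []).append(pos)
    return {label: tuple(sum(coord) / len(pts) for coord in zip(*pts))
            for label, pts in groups.items()}


# Two hypothetical micelles, each made of two particles.
print(subunit_centers([(0, 0, 0), (2, 0, 0), (10, 10, 10), (10, 12, 10)],
                      [0, 0, 1, 1]))
# {0: (1.0, 0.0, 0.0), 1: (10.0, 11.0, 10.0)}
```

The resulting centroids, rather than the raw particle coordinates, then serve as the point pattern passed to a shape descriptor.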

Many complex structures cannot be described by positions alone, and require information regarding building block 
sizes, shapes and orientations. Such structures can be described by "volumetric data," or "voxel data" (i.e., d-dimensional 
pixel data), which is represented numerically by a collection of weights or pixel intensities for cells in a 
grid that spans space. This representation is particularly apt for describing the microphase-separated morphologies 
assembled from systems of tethered nanoparticles and block copolymers, where spatial density maps for the aggregating 
species may resemble sheet-like or network domains [SHHH] (see Fig. 4). Voxel data captures the essential structural 
features of these systems, whereas a pattern based on the positions of individual particles within the superstructure 
does not. The same rule applies to many other types of structures for which the bulk shape is more important than the 
underlying particle positions, including all types of phase-separated structures, many complex biological structures 
such as proteins and macromolecules [7^ [75], and large but finite (aka "terminal") nanoparticle assemblies. Shape 
descriptors are typically flexible enough to use either voxel data or point cloud data as input. 
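Voxel data can be generated from particle coordinates by binning them onto a grid. The sketch below assumes a cubic periodic box of side `box_length` with its origin at zero and returns a sparse, normalized density map; the function name and the sparse dictionary representation are illustrative choices.

```python
def voxelize(points, box_length, n_bins):
    """Bin points from a cubic periodic box [0, box_length)^d onto a grid.

    Returns a sparse density map {voxel index tuple: fraction of points};
    empty voxels are omitted. Coordinates outside the box are wrapped back
    in with the modulo operator (periodic boundary conditions).
    """
    width = box_length / n_bins
    grid = {}
    for p in points:
        # min() guards against floating-point round-off at the upper edge
        idx = tuple(min(int((c % box_length) // width), n_bins - 1) for c in p)
        grid[idx] = grid.get(idx, 0.0) + 1.0
    return {k: v / len(points) for k, v in grid.items()}


print(voxelize([(0.5, 0.5), (0.6, 0.4), (-0.5, 9.5)], 10.0, 10))
```

Here two points fall in voxel (0, 0) and the wrapped point lands in voxel (9, 9); a smoother density map could be obtained by spreading each particle over neighboring voxels instead.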
FIG. 3: Extracting global patterns using the superposition method. For structures with long-range orientational ordering, 
such as the diamond structure formed by tetrahedrally patterned patchy spheres depicted in the panel [37], a global pattern is 
extracted by translating all local clusters [73] or density maps to a common origin. Here, the local structures are represented by 
particle positions, but more complex representations are possible. A global shape descriptor is then computed for the resulting 
finite structure. 



Shape descriptors are typically constructed to describe finite objects. Thus, when describing global structures 
such as crystals or bulk disordered systems, local shapes must first be extracted from the infinite system and then 
combined into finite patterns that reflect the "global pattern" for indexing. The types of global patterns that 
we create depend on the structural properties of the system. For structures with long-range orientational ordering, 
such as crystals and quasicrystals [75], the shapes and spatial orientations of local clusters within the system are highly 
correlated. Thus, a global structural pattern can be obtained by translating all local shapes to a common origin [19], 
a scheme that we denote as the "superposition method." The visual depiction of the superimposed structures is 
known as a "bond order diagram" [74], an example of which is depicted for the diamond structure formed by patchy 
particles [37] in Fig. 3. For crystals with multiple particle types, separate global descriptors can be created 
for each type, and these can then be combined into a single descriptor. Global descriptors based on orientational 
ordering are applicable to crystalline structures in general, including phase-separated systems arranged in crystalline 
superstructures [Ml [T^, where the neighbor directions are computed for the centers of the micelles, cylinders, etc., 
rather than for the individual particles. Some non-crystalline globally-ordered microphase-separated structures, such as 
layered or network structures, can be described by superposition as well, where global patterns are built up from local 
density maps rather than from local point clusters. This works because the probability density of observing 
particles in particular spatial directions within these morphologies is often non-uniform. 
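In its simplest point-based form, the superposition method amounts to collecting the neighbor bond vectors of every particle into one finite pattern. The sketch below assumes neighbors are defined by a plain distance cutoff, uses open (non-periodic) boundaries, and performs an O(N^2) search for clarity; `superposed_bonds` is a hypothetical name.

```python
import math
from itertools import product


def superposed_bonds(positions, cutoff):
    """Superposition method: gather all neighbor bond vectors r_j - r_i.

    Every vector is expressed relative to its own central particle, i.e. all
    local clusters are translated to a common origin. Plotting the result
    gives a bond order diagram. Open boundaries and an O(N^2) search are
    used for simplicity.
    """
    bonds = []
    for i, ri in enumerate(positions):
        for j, rj in enumerate(positions):
            if i == j:
                continue
            d = tuple(b - a for a, b in zip(ri, rj))
            if 0.0 < math.sqrt(sum(c * c for c in d)) < cutoff:
                bonds.append(d)
    return bonds


# Simple cubic lattice: the superposed pattern collapses onto 6 directions.
lattice = list(product(range(3), repeat=3))
bonds = superposed_bonds(lattice, cutoff=1.1)
print(len(bonds), sorted(set(bonds)))
```

For a crystal, the superposed vectors pile up on a few characteristic directions (here the six cubic bond directions), which is exactly the non-uniformity that a global descriptor can then index.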

For systems without long-range orientational ordering, such as liquids, glasses and amorphous solids, a different 
strategy must be employed, since in such cases the superposition of local structures inherently yields a uniform 
pattern. Instead of combining neighbor directions or density maps by superposition, we compute a 
probability distribution of local patterns. The probability histograms for different structures can then be compared to 
obtain a measure of similarity between global structures (Fig. 4). Computing probability distributions is also useful 
for certain complex orientationally-ordered structures, for which the superposition of local density maps becomes 
non-distinguishing due to the presence of many different characteristic directions within the structure. An example 
of such a structure is the double gyroid structure composed of tethered nanorods [71] shown in Fig. 4. 
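A minimal sketch of the probability distributions method follows. It assumes each particle has already been reduced to a scalar local descriptor (here a hypothetical coordination number) and uses histogram overlap, one of several possible choices, as the similarity measure between the resulting distributions.

```python
def local_histogram(values, n_bins, lo, hi):
    """Normalized histogram of per-particle descriptor values in [lo, hi).

    Values outside the range are ignored; normalization is over counted values.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        k = int((v - lo) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def histogram_overlap(p, q):
    """Similarity in [0, 1]: sum of bin-wise minima of two normalized histograms."""
    return sum(min(a, b) for a, b in zip(p, q))


# Hypothetical coordination numbers for two structures.
structure_a = local_histogram([12] * 10, n_bins=4, lo=10, hi=14)
structure_b = local_histogram([12] * 5 + [13] * 5, n_bins=4, lo=10, hi=14)
print(structure_a, structure_b, histogram_overlap(structure_a, structure_b))
```

In a realistic application each particle would be described by a richer, rotation-invariant local descriptor, and the histogram would be built over that descriptor space.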







[Fig. 4 panel labels: Liquids, glasses, disordered superstructures, complex phase-separated structures; 1) Compute shape descriptor for each local structure separately; 2) Compute probability distribution of local descriptors]



FIG. 4: Extracting global patterns using the probability distributions method. For structures without long-range orientational 
ordering, or complex global structures with many different characteristic directions, a global pattern can be built up from the 
probability distribution of local patterns. The double gyroid formed from tethered nanorods [71], which falls into the latter 
category, is characterized by computing the distribution of local nanoparticle density maps sampled throughout the structure. 
The red/blue color scheme emphasizes the bicontinuous nature of the interpenetrating network. 



B. Shape Descriptors 

Once we have extracted a representative structural pattern from our particle system, we can compute a shape 
descriptor to represent the pattern numerically. Depending on the intended application, different shape descriptors 
may be best suited to describe a particular structural pattern, and this information should be considered when deciding 
which shape descriptor to compute. Below is a short list of desirable shape descriptor properties within the context 
of assembled systems: 

• Robustness: the degree of sensitivity to structural defects or random thermal noise. Some shape descriptors 
have an inherent data-smoothing mechanism, whereas others require preprocessing to effectively process thermal 
data. 

• Invariance: the ability of shape descriptors to remain invariant (i.e., unchanged) under certain mathematical 
transformations. Invariance under scaling and translations is typically desirable. Additionally, descriptors may 
be invariant under rotations, mirroring operations, or similarity transformations. Rotation invariance is the 
most important of these properties for particle systems. For descriptors without rotation invariance, we often 
must align or "register" [771 [TS] objects prior to matching. 

• Efficiency: the computational effort required to calculate the descriptor. For certain applications, CPU time 
and memory costs may be a limiting factor for choosing a shape descriptor. For example, efficiency may be an 
important factor for on-the-fly order parameter calculations that occur during a molecular simulation, whereas 
for offline data analysis it may be irrelevant. Often, there is a direct tradeoff between computational cost and 
accuracy. 

• Comparability: the ease of matching. Shape descriptors should yield similar results for similar structures and 
different results for different structures. Shape descriptors should be constructed such that similarity is easy 
to quantify. The numerical similarity should directly reflect the physical similarity between the shapes used to 
construct the descriptors. 

Below, we review some shape descriptors from the computer science shape matching literature. Since shape matching 
is a broad field, we focus on the subset of methods that are best suited for assembled systems. For a general review 
of some relevant shape matching methods, see references [79-81]. 
FIG. 5: Depiction of six different shape descriptors applied to self-assembled systems. (a) The point matching descriptor [77, 82]. 
Descriptor components are given trivially by particle positions or density maps. (b) The shape histogram descriptor [83]. The 
structure is indexed into a spatial histogram consisting of shells and sectors. (c) Shape distribution descriptors [84]. The 
probability distribution is computed for various local measurements, such as the distance or angle between surface points. (d) 
Harmonic descriptors [85-88]. The shape histogram is decomposed into a convenient harmonic representation, which can be 
used for rotation-invariant matching. (e) The shape contexts descriptor [89]. A coarse histogram is created for each point on the 
structure. The descriptor is given by the collection of sub-descriptors for each point. (f) The lightfield descriptor [90]. Images 
or projections are constructed from several different vantage points and indexed into individual shape descriptors. The overall 
descriptor is given by the collection of sub-descriptors for each image. 



Point-Matching Descriptor: For relatively simple structures such as small clusters of atoms, molecules, or 
nanoparticle/colloidal building blocks, we can use the particle positions themselves (or a corresponding density map) as a 
shape descriptor (Fig. 5a). Matching for this scheme is often based on the root-mean-square (RMS) difference 
between points, and thus the scheme itself is sometimes referred to as "RMS matching." Point matching schemes were 
applied to early attempts at shape matching for macromolecules [91], and more complex variations have since been 
implemented for proteins [92]. Point matching schemes have the advantage of being conceptually trivial; however, 
there are many subtleties associated with these schemes that should be considered. First, the descriptor requires an 
assignment step to determine the optimal correspondence between points in compared structures, which is used to 
reorder the coordinates in the shape descriptors accordingly. Also, since the descriptors are sensitive to scale, position, 
and orientation, structures must first be normalized and registered unless the orientations are known beforehand 
or rotation-dependent matching is desired. Depending on the application, shapes may be registered based on rigid 
alignment or other constraints. Since both assignment and registration are computationally expensive (i.e., they scale 
poorly with the number of points, n), point matching descriptors should be avoided unless (1) the number of atoms, 
molecules, or building blocks that make up the structure is small, (2) matching is required for only a few structures, 
or (3) registration is not required. 
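The assignment step and RMS comparison can be sketched as follows. This version assumes the two structures are already normalized and registered, and it finds the optimal correspondence by brute force over all n! permutations, which illustrates why assignment scales poorly; for more than a handful of points one would instead use a polynomial-time assignment algorithm such as the Hungarian method. The function name is hypothetical.

```python
import math
from itertools import permutations


def rms_match(query, reference):
    """RMS point matching with an explicit (brute-force) assignment step.

    Assumes both structures have the same number of points and have already
    been normalized and registered; registration itself is not performed.
    Returns (rms, order), where order[i] is the reference index assigned to
    query point i.
    """
    n = len(query)
    if n == 0 or n != len(reference):
        raise ValueError("structures must be non-empty and the same size")
    best_sq, best_order = math.inf, None
    for order in permutations(range(n)):  # n! candidate correspondences
        sq = sum(math.dist(query[i], reference[order[i]]) ** 2 for i in range(n))
        if sq < best_sq:
            best_sq, best_order = sq, order
    return math.sqrt(best_sq / n), best_order


square = [(1, 0), (0, 1), (-1, 0), (0, -1)]
shuffled = [(0, -1), (1, 0), (0, 1), (-1, 0)]
print(rms_match(shuffled, square))  # (0.0, (3, 0, 1, 2))
```

Without the assignment step, comparing the shuffled list point by point would report a large RMS difference for what is physically the same structure.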

Shape Histogram: Another conceptually simple shape descriptor that has been applied to molecular database 
searches is known as the "shape histogram" [83] (Fig. 5b). This descriptor is based on a density map of the structure 
on a polar or spherical grid. Shape histograms are best suited for describing structural patterns that can be broken 
down into concentric shells, such as nanoparticle clusters, proteins and macromolecules. Shape histograms are also well 
suited for indexing global patterns created by the superposition method, as outlined above, and can index structures 
with orientational ordering such as crystals or quasicrystals, and simple microphase-separated structures such as 
layered phases or network structures. The shape histogram has the advantage over the point matching method that 
no assignment step is required, since the ordering of points is lost during binning. Additionally, the grid resolution 
can be adjusted to provide a variable degree of coarse-graining. Like the point matching method, the shape histogram 
requires registration to match non-aligned objects, unless only radial bins are used (i.e., there is no angular 
binning). However, shape histograms may lose their discerning capabilities without an angular component. If 
n is large, the cost of registration can be significantly reduced by aligning the histograms themselves rather than the 
underlying structures. 
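A 2d version of the shape histogram, together with registration performed on the histograms themselves rather than on the points, can be sketched as below. The registration here only tries cyclic shifts of the angular sectors, i.e. rotations by multiples of 2π/n_sectors, which is a simplifying assumption; the grid parameters and function names are illustrative. A 3d version would use spherical shells and solid-angle sectors.

```python
import math


def shape_histogram_2d(points, n_shells, n_sectors, r_max):
    """Shape histogram on a polar grid centred at the origin (2d version).

    Returns an n_shells x n_sectors grid of point fractions. Points with
    r >= r_max are ignored. With n_sectors = 1 the histogram is purely radial
    and therefore rotation-invariant, but less discriminating.
    """
    hist = [[0.0] * n_sectors for _ in range(n_shells)]
    for x, y in points:
        r = math.hypot(x, y)
        if r >= r_max:
            continue
        shell = min(int(r / r_max * n_shells), n_shells - 1)
        phi = math.atan2(y, x) % (2.0 * math.pi)
        sector = min(int(phi / (2.0 * math.pi) * n_sectors), n_sectors - 1)
        hist[shell][sector] += 1.0
    total = sum(map(sum, hist)) or 1.0
    return [[c / total for c in row] for row in hist]


def align_by_sector_shift(h_query, h_ref):
    """Register two histograms by trying every cyclic shift of the sectors.

    Returns (shift, distance) minimizing the L2 distance when query sector
    k + shift is compared with reference sector k, i.e. the query is treated
    as the reference rotated counterclockwise by shift * 2*pi / n_sectors.
    """
    n_sectors = len(h_ref[0])
    best = None
    for s in range(n_sectors):
        d2 = sum((row_q[(k + s) % n_sectors] - row_r[k]) ** 2
                 for row_q, row_r in zip(h_query, h_ref) for k in range(n_sectors))
        if best is None or d2 < best[1]:
            best = (s, d2)
    return best[0], math.sqrt(best[1])


h = shape_histogram_2d([(0.5, 0.0), (0.0, 1.5), (-1.5, 0.0)], 2, 4, 2.0)
print(h)
```

Aligning histograms costs O(n_shells × n_sectors^2) regardless of the number of points, which is why this route becomes attractive when n is large.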

Shape Distributions: For many applications, registration is too costly and we require rotation-invariant descriptors. 
A simple yet powerful method for creating invariants, known as the "shape distributions" scheme [S3] (Fig. 5c), involves 
computing distribution functions for simple rotationally-invariant local metrics. Such local metrics are defined based 
on object surfaces; thus this method is best applied to structures with clearly defined, yet distinguishable, surfaces, 
such as microphase-separated structures formed by block copolymers [321 HZ] or tethered nanoparticles [31 US] |M1 IS3] 
(see, for example, Fig. 4). The shape distribution "D2" is defined as the probability distribution of the distance 
between pairs of surface points. Another similar distribution, "A3," is defined by the probability distribution of angles 
formed by triples of surface points. Similar distributions are defined for higher numbers of points. The distributions 
D2 and A3 are similar to the radial distribution function g(r) and angular distribution function a(θ), respectively, 
although usually only surface particles are considered. Like g(r) and a(θ), shape distributions are too coarse to 
distinguish between similar shapes, such as small polyhedral clusters. 
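The D2 shape distribution can be estimated by random sampling of point pairs, as in the minimal sketch below. It assumes the caller has already identified the surface points; the bin count, sample size and fixed random seed are arbitrary choices made for illustration.

```python
import math
import random


def d2_distribution(surface_points, n_bins, r_max, n_samples=10000, seed=0):
    """D2 shape distribution estimated by Monte Carlo sampling.

    Draws n_samples random pairs of distinct surface points and histograms
    their distances into n_bins bins on [0, r_max); longer distances are
    dropped. Distances are unchanged by rotations and translations, so the
    result is invariant under both.
    """
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n_samples):
        p, q = rng.sample(surface_points, 2)
        k = int(math.dist(p, q) / r_max * n_bins)
        if k < n_bins:
            counts[k] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
rotated = [(-y, x) for x, y in square]  # same square rotated by 90 degrees
print(d2_distribution(square, n_bins=8, r_max=2.0) == d2_distribution(rotated, n_bins=8, r_max=2.0))
```

With the same seed, the rotated square yields an identical histogram, with weight only in the bins containing the edge length 1 and the diagonal √2.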

Harmonic / Invariant Moment Descriptors: A more complex but more powerful method for computing invariant 
descriptors is to compute the harmonic transform of the shape histogram. By disregarding the phase information, 
we obtain descriptors that are invariant under rotations (Fig. 5d). Like the shape histogram, harmonic descriptors 
are versatile and can be applied to a wide range of structures including complex nanoparticle clusters, proteins and 
macromolecules, and crystalline or microphase-separated structures. The method by which we compute the harmonic 
transform of the shape histogram depends on the underlying basis. Invariants can be obtained for shapes on the 
circle [85] (θ-dependence), sphere [86] (θ, φ-dependence), disk [87] (r, θ-dependence) and ball [88] (r, θ, φ-dependence). On 
the unit circle or sphere, the harmonic descriptors are called "Fourier descriptors," whereas on the disk or ball, the 
descriptors are known as "Zernike descriptors." The implementation of these methods for complex assembled systems 
is described in detail elsewhere [94]. Harmonic descriptors exhibit an inherent data-smoothing mechanism that makes 
them better suited than the shape histogram for describing small polygonal or polyhedral clusters, since the shape histogram 
is prone to error without sufficient averaging. This property, combined with rotational invariance, makes 
harmonic descriptors ideal for describing orientationally-disordered global structures, such as liquids, glasses and 
certain microphase-separated structures, via the probability distributions method. Harmonic descriptors also contain 
additional frequency-dependent information regarding the symmetries of the structure. These unique properties of 
harmonic descriptors have already been successfully applied to constructing orientational order parameters for small 
clusters and simple crystals [13] [Hj]. 
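For the simplest case, a pattern on the unit circle (e.g., the directions θ_j from a particle to its neighbors in 2d), the Fourier coefficients and their rotation-invariant magnitudes can be sketched as below. For l = 6 applied to nearest-neighbor directions, |c_6| is the magnitude of the familiar 2d hexatic bond order parameter ψ6. The function name is hypothetical, and the sketch does not cover the sphere, disk or ball cases.

```python
import cmath
import math


def fourier_invariants(angles, l_max):
    """Rotation-invariant Fourier descriptor of a pattern on the unit circle.

    c_l = (1/N) * sum_j exp(i * l * theta_j). Rotating every angle by alpha
    multiplies c_l by exp(i * l * alpha), so discarding the phase and keeping
    |c_l| gives invariants for l = 0 .. l_max.
    """
    n = len(angles)
    return [abs(sum(cmath.exp(1j * l * t) for t in angles)) / n
            for l in range(l_max + 1)]


hexagon = [k * math.pi / 3 for k in range(6)]
rotated = [t + 0.3 for t in hexagon]
print([round(c, 6) for c in fourier_invariants(hexagon, 6)])
print([round(c, 6) for c in fourier_invariants(rotated, 6)])
```

Both lines show the same invariants: the six-fold pattern has weight only at l = 0 and l = 6, illustrating how the frequency content encodes the symmetry of the structure.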

Shape Contexts: It is fairly common in the context of self-assembly experiments and simulations to encounter 
nearly-ideal assembled structures with localized defects. Thus, it is often desirable to distinguish between local 
structural dissimilarities that arise due to defects and "overall" differences in the structure. A brute-force solution to 
this problem is to explicitly include defective structures in the library of reference structures such that they may be 
identified directly [67]; however, this requires a priori knowledge of the entire space of potential defective structures. 
Obtaining such knowledge may be intractable for complex assemblies with many degrees of structural freedom, or 
for unmapped systems whose local motifs have not yet been thoroughly studied. A more general solution is to apply 
a "partial matching" scheme, such as the "shape contexts" method [89, 95], which is capable of matching structures 
independently of local defects, as well as identifying such defects (Fig. 5e). The shape contexts method combines 
elements of the point matching scheme with the shape histogram descriptor. Here, a separate shape histogram is 
computed for each sample point in the structure, where the coordinate system is centered at that point. The points in 
the query structure are then assigned to their corresponding points in the reference structure by optimizing the match 
between shape histograms. Outlier points that do not correspond well (i.e., local defects) can be excluded to obtain 
a partial match, or used to identify the defects. Shape contexts can be applied to any system where local defects 
might arise, such as atomic or molecular clusters, micro- or nanoscale assemblies, or biological structures. Since shape 
contexts are based on the shape histogram, they have the same limitations when indexing structures with a small 
number of sample points locally. 
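A compact 2d sketch of shape-context matching follows. It simplifies the published method in ways we flag as assumptions: radial bins are linear rather than log-polar, the assignment is brute force (small n only), and outliers are left for the caller to flag by thresholding the returned per-point costs. All function names and default parameters are hypothetical.

```python
import math
from itertools import permutations


def shape_context(points, i, n_shells, n_sectors, r_max):
    """Polar histogram of all other points as seen from points[i] (2d, linear radial bins)."""
    xi, yi = points[i]
    h = [0.0] * (n_shells * n_sectors)
    for j, (x, y) in enumerate(points):
        if j == i:
            continue
        r = math.hypot(x - xi, y - yi)
        if r >= r_max:
            continue
        shell = min(int(r / r_max * n_shells), n_shells - 1)
        phi = math.atan2(y - yi, x - xi) % (2.0 * math.pi)
        sector = min(int(phi / (2.0 * math.pi) * n_sectors), n_sectors - 1)
        h[shell * n_sectors + sector] += 1.0
    total = sum(h) or 1.0
    return [c / total for c in h]


def chi2_cost(h, g):
    """Chi-squared distance between two normalized histograms (0 = identical)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h, g) if a + b > 0)


def match_shape_contexts(query, reference, n_shells=2, n_sectors=4, r_max=3.0):
    """Assign query points to reference points by minimizing the total chi-squared cost.

    Brute force over assignments (requires len(query) <= len(reference), small n).
    Returns (assignment, per-point costs); large costs flag candidate defects.
    """
    hq = [shape_context(query, i, n_shells, n_sectors, r_max) for i in range(len(query))]
    hr = [shape_context(reference, j, n_shells, n_sectors, r_max) for j in range(len(reference))]
    cost = [[chi2_cost(a, b) for b in hr] for a in hq]
    best = min(permutations(range(len(reference)), len(query)),
               key=lambda p: sum(cost[i][p[i]] for i in range(len(query))))
    return best, [cost[i][best[i]] for i in range(len(query))]


reference = [(0, 0), (1, 0), (1, 1), (0, 1)]
query = [(x + 5, y + 5) for x, y in reference]  # translated copy, no defects
print(match_shape_contexts(query, reference))
```

Because each histogram is centered on its own point, a translated copy matches with zero cost at every point; a displaced point in the query would instead show up as an elevated per-point cost.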

Lightfield Descriptor: The shape contexts descriptor is just one example of the more general approach of creating 
powerful new descriptors by combining simpler sub-descriptors. A similar method based on combining sub-descriptors 
is the lightfield descriptor [90], which involves projecting 3D structures onto 2D images from 20 vantage points 
at the vertices of a dodecahedron. This process effectively simulates the act of viewing a structure from different 
angles by eye, giving the lightfield descriptor its name (Fig. 5f). The lightfield descriptor can thus be applied to 
microphase-separated structures, nano/colloidal scale assemblies, or other structures that can be effectively identified 
by the trained eye. Each of the 20 2d images is indexed by a 2d descriptor, and assignment is performed between pairs 
of these descriptors for the compared structures to optimize correspondence. In practice, many initial rotations of the 
dodecahedron are attempted to minimize error due to small offsets in spatial orientation. 

Other Possible Descriptors: In addition to the shape descriptors outlined above, the shape matching literature 
defines numerous potentially useful descriptors that we have not mentioned here. Some intriguing possibilities include 
graph-based descriptors [96-98], descriptors based on reflective symmetries [99], and methods based on the similarity 
of slices of objects [100]. Several structural metrics from the condensed matter literature might also serve as useful 
shape descriptors for some applications. For example, in the realm of global structures, diffraction patterns, radial 
distribution functions, or orientation tensors (e.g., the radius of gyration tensor or the nematic order tensor [101]) 
could be indexed into shape descriptors. For local structures, analysis schemes such as the common neighbor analysis 
scheme of reference [16] could be easily incorporated. Although many of the structural metrics from the literature may 
not be independently distinguishing for a wide range of problems, they may still yield useful information as part of a 
more general scheme through a combination of descriptors. 

C. Similarity Metrics 

The degree to which two shape descriptors match [102] is quantified by a similarity metric. Computing a similarity
metric involves reducing the complex information contained in shape descriptors into a single scalar value that indicates 
the degree of matching. The similarity metric that best suits a particular application depends on both the shape 
descriptor and the intended physical application. Some desirable properties of similarity metrics are listed below: 

• Metric Behavior: the ability of a similarity metric to give a value that tracks the physical match
between the structures. Some similarity metrics satisfy the triangle inequality [TUS] (i.e., $M(S_a, S_b) +
M(S_a, S_c) \geq M(S_b, S_c)$, where $M$ is a similarity metric and $S_a$, $S_b$, $S_c$ are shape descriptors) and are thus
true metrics, whereas others do not and can be considered pseudo-metrics. It is typically desirable for similarity
metrics to vary smoothly with the difference between structures.

• Normalization: the range of possible matching values for a given matching scheme. For many condensed matter
physics applications, we desire similarity metrics that range from 0 to 1 for use as pseudo-order parameters.
While many similarity metrics do not vary naturally from 0 to 1, they can often be rescaled by simply shifting
and scaling the interval that defines an ideal and a worst-case match. In practice, there is little difference between
this type of pseudo-order parameter and a standard order parameter in terms of the underlying physics.

• Specificity: the degree to which a similarity metric highlights specific differences between shape descriptors. 
For some applications it is desirable to give more weight to specific important differences between the descriptors. 
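The shift-and-scale normalization described under Normalization is just a linear map. The sketch below assumes the user chooses the raw values that count as an ideal and a worst-case match (the function name and the worst-case value in the example are hypothetical choices, not values from the text).

```python
def rescale_to_unit_interval(value, ideal, worst):
    """Shift and scale a raw matching value so that `ideal` maps to 1 and `worst` maps to 0.

    Works whether larger raw values are better (e.g. a similarity in [-1, 1])
    or worse (e.g. a distance); values outside the interval are clipped.
    """
    if ideal == worst:
        raise ValueError("ideal and worst-case values must differ")
    x = (value - worst) / (ideal - worst)
    return min(1.0, max(0.0, x))

# Example: a Euclidean distance where 0 is a perfect match and 2.0 is taken as the worst case
print(rescale_to_unit_interval(0.5, ideal=0.0, worst=2.0))  # prints 0.75
```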

Often, similarity metrics are based on simple geometric functions, such as the Euclidean distance or vector projection
between shape descriptors, which are typically represented as long vectors. Whereas similarity metrics based on
the Euclidean distance are particularly common in the shape matching literature [64], schemes based on the vector
projection are more commonly (implicitly) applied throughout the condensed matter literature [T21[T51[Tnj. In practice,
the mathematical form of the similarity metric is typically of little consequence; virtually any function can be chosen,
provided it varies smoothly as the shapes become physically different. In some specific cases, specialized similarity
metrics are designed to be used in conjunction with particular shape descriptors. The shape histogram scheme
described in section II B above uses a specialized quadratic form distance function for matching [83], which accounts
for mismatches arising from near-misses that occur due to the discrete nature of the histogram bins. The P2 Legendre
polynomial shown in Fig. 1 is an implicit example of a specialized similarity metric, specifically designed to match the
angles of rod-like particles with the ideal angle given by the global director [103].
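To make the two families concrete, the sketch below compares a distance-type measure, a projection-type measure (a normalized dot product), and the P2 Legendre polynomial used for rod alignment. These are generic textbook forms chosen for illustration. They are not the specialized quadratic form distance of [83], and the function names are hypothetical.

```python
import numpy as np

def euclidean_distance(a, b):
    """Distance-type comparison: 0 for identical descriptors, growing as they differ."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def projection_similarity(a, b):
    """Projection-type comparison (normalized dot product): 1 for parallel vectors, 0 for orthogonal."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def p2_alignment(u, director):
    """P2(cos theta) = (3 cos^2 theta - 1) / 2 between a rod axis u and the director.

    Gives 1 for rods parallel or antiparallel to the director and -1/2 for perpendicular rods.
    """
    u = np.asarray(u, float)
    n = np.asarray(director, float)
    c = u @ n / (np.linalg.norm(u) * np.linalg.norm(n))
    return float(0.5 * (3.0 * c * c - 1.0))
```

Because P2 depends on cos²θ, it treats a rod and its flipped copy as identical, which is why it suits head-tail symmetric particles.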



III. EXAMPLE APPLICATIONS 



In this section, we demonstrate the application of shape matching techniques to a few representative problems from 
our studies of self-assembly. For simplicity, we use the same shape descriptor and similarity metric for all of the 
examples. Since our goal here is to demonstrate the basic usage of shape matching techniques, our examples should 
be considered proofs-of-concept rather than optimal solutions to the problems. Additional examples of applications 
of other shape descriptors to self-assembly may be found in References [5^ [TM].



A. Prototype Shape Matching Scheme 

For our example problems, we use the 3d Fourier shape descriptor [86], which is the harmonic descriptor defined
for patterns on the sphere, $[\theta, \phi]$. We choose this descriptor because it is closely related to the spherical harmonics
bond order parameters introduced by Nelson and coworkers [18, 19], and thus many readers will already be partially
familiar with it. The basic idea behind the 3d Fourier descriptor is to decompose a 3d structure into one or
more patterns on the 2d surface of a sphere, and to represent these patterns mathematically by computing their discrete
spherical harmonics transform (DSHT). This method of representing a pattern by its harmonic transform is analogous
to the way that 1d signals along the perimeter of a circle can be described by their discrete Fourier transform
(DFT).

How we extract the patterns on the sphere depends on how the data are represented. For simplicity, we use a minimal
data representation based solely on particle positions (i.e., point cloud data) for all of our examples; however, other
types of data, such as volumetric data, can also be easily treated by Fourier descriptors. For our examples, we describe
particle structures as patterns on the sphere by (1) translating the structure to the origin, (2) grouping all positions
within a radial shell, and (3) converting each position $\mathbf{x}$ into its angular direction relative to the origin,
$[\theta(\mathbf{x}), \phi(\mathbf{x})]$. This is repeated for all radial shells required to describe the full 3d structure,
giving one pattern on the sphere per shell for each structure.
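Steps (1) to (3) can be sketched in a few lines of NumPy. Two choices here are assumptions rather than details from the text: the structure is translated so that its centroid sits at the origin, and shells are given as a list of radial bin edges. The function name is hypothetical.

```python
import numpy as np

def shell_patterns(positions, shell_edges):
    """Decompose a point cloud into patterns on the sphere, one per radial shell.

    positions   : (N, 3) particle coordinates
    shell_edges : increasing radii [r0, r1, ..., rK]; shell k holds points with r_k <= r < r_{k+1}
    Returns a list of (n_k, 2) arrays of (theta, phi) angles, one per shell.
    """
    x = np.asarray(positions, dtype=float)
    x = x - x.mean(axis=0)  # (1) translate the centroid to the origin (assumed choice of origin)
    r = np.linalg.norm(x, axis=1)
    patterns = []
    for lo, hi in zip(shell_edges[:-1], shell_edges[1:]):
        # (2) group positions within this shell; a point exactly at the origin has no direction
        sel = (r >= lo) & (r < hi) & (r > 0)
        xs, rs = x[sel], r[sel]
        # (3) convert each position into its angular direction
        theta = np.arccos(np.clip(xs[:, 2] / rs, -1.0, 1.0))  # polar angle
        phi = np.arctan2(xs[:, 1], xs[:, 0])                   # azimuthal angle
        patterns.append(np.stack([theta, phi], axis=1))
    return patterns
```

For structures with a natural central particle, translating that particle to the origin instead of the centroid would be an equally reasonable choice.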

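For each pattern on the sphere, the Fourier coefficients of the DSHT are given by:

$$q_\ell^m = \frac{1}{N} \sum_{i=1}^{N} Y_\ell^{m*}\left[\theta(\mathbf{x}_i), \phi(\mathbf{x}_i)\right], \qquad m = -\ell, -\ell+1, \ldots, \ell. \qquad (1)$$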

Here $Y_\ell^m$ are the spherical harmonics with angular frequency $\ell$, the asterisk denotes complex conjugation, and
$N$ is the number of points in the pattern. For each $\ell$, the coefficients form a vector $\mathbf{q}_\ell$ with $2\ell+1$
complex components. Although the Fourier coefficients in their complex form are rotation-dependent (i.e., their values
depend on the spatial orientation of the underlying pattern), we can convert them to a rotation-invariant form by
computing the magnitude of each coefficient vector. The rotation-invariant coefficients are given by:



$$|\mathbf{q}_\ell| = \sqrt{\sum_{m=-\ell}^{\ell} \left|q_\ell^m\right|^2}. \qquad (2)$$
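Rather than evaluating every spherical harmonic, the invariant in Eq. (2) can be computed from pairwise angles using the spherical harmonic addition theorem, $\sum_m Y_\ell^m(\hat{u}_i) Y_\ell^{m*}(\hat{u}_j) = \frac{2\ell+1}{4\pi} P_\ell(\hat{u}_i \cdot \hat{u}_j)$. Assuming Eq. (1) carries a $1/N$ prefactor, this gives $|\mathbf{q}_\ell|^2 = \frac{2\ell+1}{4\pi N^2} \sum_{i,j} P_\ell(\hat{u}_i \cdot \hat{u}_j)$. The sketch below uses this identity. It is a swapped-in computational route (cost $O(N^2)$ per shell, fine for small local clusters), the function name is hypothetical, and it yields only the invariants, not the rotation-dependent complex coefficients.

```python
import numpy as np
from numpy.polynomial import legendre

def fourier_invariant(theta, phi, ell):
    """Rotation-invariant |q_ell| of one pattern on the sphere, via the addition theorem."""
    theta = np.asarray(theta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    # Unit vectors for each point of the pattern
    u = np.stack([np.sin(theta) * np.cos(phi),
                  np.sin(theta) * np.sin(phi),
                  np.cos(theta)], axis=1)
    n = len(u)
    cos_gamma = np.clip(u @ u.T, -1.0, 1.0)      # cosines of all pairwise angles
    coeffs = np.zeros(ell + 1)
    coeffs[ell] = 1.0                            # selects the Legendre polynomial P_ell
    p = legendre.legval(cos_gamma, coeffs)       # P_ell evaluated elementwise
    total = (2 * ell + 1) / (4 * np.pi) * p.sum() / n**2
    # The exact value is a sum of squares; clip tiny negative round-off
    return float(np.sqrt(max(total, 0.0)))
```

Because only pairwise angles enter, the result is unchanged by any rigid rotation of the pattern, which is exactly the invariance Eq. (2) is meant to provide.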



The Fourier invariants are non-negative real numbers. Although the coefficient magnitudes themselves can be used directly
as order parameters [19], incorporating them into a shape descriptor is often more powerful, since we can compare
shapes based on a variety of frequencies and length scales. To create a descriptor from the Fourier coefficients, we
simply concatenate the desired $\mathbf{q}_\ell$ or $|\mathbf{q}_\ell|$ into a long vector. For example, a general rotation-invariant shape descriptor
that is applicable to patterns on the sphere over a range of symmetries is given by:

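$$\mathbf{S}^{\mathrm{shell}}_k = \left\langle |\mathbf{q}_{\ell_{\min}}|, |\mathbf{q}_{\ell_{\min}+1}|, \ldots, |\mathbf{q}_{\ell_{\max}}| \right\rangle, \qquad (3)$$

where $k$ labels the radial shell.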

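The range of frequencies can be adjusted to obtain a desired level of resolution. For our examples below, we use
$\ell_{\min} = 4$ and $\ell_{\max} = 12$. Since each Fourier descriptor describes the pattern for a given shell, we must combine the
Fourier descriptors for all shells to describe the overall shape:

$$\mathbf{S} = \left\langle \mathbf{S}^{\mathrm{shell}}_1, \mathbf{S}^{\mathrm{shell}}_2, \ldots, \mathbf{S}^{\mathrm{shell}}_{N_s} \right\rangle, \qquad (4)$$

where $N_s$ is the number of radial shells.
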
For our example applications, we will use a simple similarity metric based on the Euclidean distance $|\mathbf{S}_i - \mathbf{S}_j|$
between harmonic shape descriptors:

$$M(\mathbf{S}_i, \mathbf{S}_j) = 1 - 2\,\frac{|\mathbf{S}_i - \mathbf{S}_j|}{|\mathbf{S}_i| + |\mathbf{S}_j|}. \qquad (5)$$

This similarity metric decreases linearly with the Euclidean distance between shape descriptor vectors, and is normalized
such that vectors that match perfectly give a value of 1, while vectors that are perfectly anticorrelated give a value of
-1. Vectors with no directional correlation (i.e., that are orthogonal) give intermediate values; for example, two orthogonal
vectors of equal magnitude give $1 - \sqrt{2} \approx -0.41$. This normalization allows us to draw a clearer analogy between our
matching scheme and a typical order parameter; however, only the relative value of the similarity metric is relevant, and
the normalization is merely a matter of convenience.
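A minimal sketch of how the per-shell invariants might be assembled into one descriptor and compared with the similarity metric of Eq. (5). It assumes the shells are joined by simple concatenation and that each shell vector already holds $|\mathbf{q}_\ell|$ for $\ell = 4, \ldots, 12$ (nine numbers per shell); the function names are hypothetical.

```python
import numpy as np

def combine_shells(shell_descriptors):
    """Concatenate per-shell vectors <|q_lmin|, ..., |q_lmax|> into one long descriptor."""
    return np.concatenate([np.asarray(s, dtype=float) for s in shell_descriptors])

def similarity(s_i, s_j):
    """Eq. (5): M = 1 - 2 |S_i - S_j| / (|S_i| + |S_j|).

    1 for identical descriptors, -1 for perfectly anticorrelated ones.
    """
    s_i = np.asarray(s_i, dtype=float)
    s_j = np.asarray(s_j, dtype=float)
    denom = np.linalg.norm(s_i) + np.linalg.norm(s_j)
    if denom == 0.0:
        raise ValueError("cannot compare two all-zero descriptors")
    return float(1.0 - 2.0 * np.linalg.norm(s_i - s_j) / denom)
```

Since rotation-invariant descriptors have only non-negative components, two such descriptors are never anticorrelated in practice, so values near -1 should not be expected from real structures.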



B. Example 1: Micellar Crystal Structures 
FIG. 6: Identification of global crystalline structures for a system of ditethered spheres [56, 76]. (a) A crystal formed in the
ditethered nanosphere system where the planar angle between tether attachments is 30 degrees. Ignoring the chemical specificity
of the tether micelles, the structure best matches the ideal diamond lattice. (b) A crystal formed by the ditethered nanosphere
system with a planar angle between tether attachments of 60 degrees. Ignoring the chemical specificity of the micelles, the
structure best matches an ideal simple cubic structure. In both cases, the micelle centers are extracted using a Gaussian filter,
and matching is based on the global superposition of local patterns (section II A).



A straightforward application of shape matching techniques to particle systems is to identify unknown structures
by searching a database of known reference structures. An unknown structure is identified as the known structure that
gives the best match. Structure identification can be performed either for local structures or for global samples. As a
simple example of structure identification for a global sample, consider the ditethered nanosphere system of references
[56, 76], which microphase separates into spherical micelles. The micelles themselves pack into an ordered binary crystalline
superstructure. Depending on the state point, the system forms different crystals, as shown in Fig. 6. The structural
pattern that represents the different crystals is obtained by identifying the micelle centers of mass, which comprise the
set of positions that describe the system. The micelle centers of mass are determined by creating a density map (i.e.,
a voxel representation) for the aggregating polymer tethers and then applying a Gaussian filtering algorithm adapted
from the colloidal science literature [48, 52] to identify the spheroid centers. Since the superstructure has long-range
orientational ordering, a global pattern is given by the superposition of local patterns (see Fig. 5). The pattern for the
unknown crystal is compared to those for several standard candidate crystals. For each pattern we compute the 3d
Fourier descriptor $\mathbf{S}$ described above, with rotation-invariant coefficients for a single shell. Using this
method, the patterns are compared independently of spatial orientation over the single length scale used to construct the
local clusters. The unknown crystal is identified as the reference structure that gives the best match. The structures
in Fig. 6a,b are identified as diamond and simple cubic, respectively, where we do not consider the chemical specificity
of the two types of micelles. Notice that the best match does not necessarily give a value that approaches 1; such
deviations are common when comparing thermal systems to mathematically perfect reference structures, as we have
done here. The micellar system under investigation exhibits thermal disorder as well as polydispersity in the shape
and size of the micelles, and thus particle positions deviate from the ideal lattice points. In such cases, comparing to
reference systems that exhibit similar levels of noise may provide clearer results.
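The database search itself reduces to an argmax of the similarity metric over the reference library. In the sketch below the reference descriptors are toy placeholder vectors, not actual diamond or simple cubic descriptors, and the helper names are hypothetical; `similarity` restates Eq. (5) so the block is self-contained.

```python
import numpy as np

def similarity(s_i, s_j):
    """Eq. (5): M = 1 - 2 |S_i - S_j| / (|S_i| + |S_j|)."""
    s_i, s_j = np.asarray(s_i, float), np.asarray(s_j, float)
    return float(1.0 - 2.0 * np.linalg.norm(s_i - s_j) / (np.linalg.norm(s_i) + np.linalg.norm(s_j)))

def identify(unknown, references):
    """Return (name, score) of the reference descriptor that best matches `unknown`.

    references : dict mapping a structure name to its descriptor vector
    """
    scores = {name: similarity(unknown, ref) for name, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy example with made-up descriptors
refs = {"ref_a": [1.0, 0.0, 0.0], "ref_b": [0.0, 1.0, 0.0]}
name, score = identify([0.9, 0.1, 0.0], refs)
```

As noted above for thermal systems, the winning score can sit well below 1; only its rank relative to the other references decides the identification.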
This type of database search has already been applied to particle systems in the context of proteins and
macromolecules flll [731 El IMl ll05H107j. Although database searches have only been applied in limited cases to
assembled systems [57, 71], many standard local structure identification schemes in the condensed matter literature
bear a strong resemblance to shape matching identification schemes. For example, the common neighbor analysis
(CNA) scheme of reference [16] involves constructing numerical fingerprints for pairs of atoms based on their local
neighbor configurations, and identifying local clusters by matching the distribution of fingerprints with those for
ideal structures. In the language of shape matching, the collection of CNA fingerprints can be considered a shape
descriptor, and the catalogue of ideal fingerprints can be considered a database of reference structures. A similar
identification scheme is given by the bond order parameters of reference [19]. Here, particular local structures with
strong symmetries, such as small ordered clusters of spherical particles, can be identified by finding structures with
bond order parameters that exceed a particular threshold [108]. In this case, the bond order parameters represent
shape descriptors, and the threshold values act implicitly as similarity metrics, since the ideal structures are known
to have high values of the bond order parameters.

C. Example 2: Icosahedral Clusters of Tetrahedral Particles 

As mentioned in the previous example, a common application of structural characterization schemes is to identify
local motifs within a global system. Examples include finding locally stable clusters in liquids Jiij .22J, colloids and
gels [109], and nanoparticle superstructures [57, 71], and identifying structural defects in, or grain boundaries between,
crystalline domains, such as in dense colloids jS]. Often, these local structural characteristics can be directly related
to the thermodynamic, mechanical, or other properties of the system.

When detecting local structures in systems without long-range orientational order (i.e. "disordered" systems), we 
often encounter structures that are not present in our reference library. A structure that does not match with those 
in the reference library within a certain threshold is considered "disordered," or unimportant [67, 71]. The threshold
must be chosen carefully; in thermal systems, an overly stringent cutoff value might cause a matching scheme to miss
highly ordered structures perturbed slightly from their ideal configurations, whereas an overly permissive cutoff can
misidentify highly disordered structures. In most cases, a cutoff can be chosen such that its precise value does not
affect the qualitative results.




FIG. 7: Icosahedral clusters in the hard tetrahedron system [22]. As the pressure and the corresponding density increase,
icosahedra grow more prevalent until the system transforms into a dodecagonal quasicrystal at $P \approx 62$, at which point the
number of icosahedra vanishes.

As an example of identifying ordered local structures in an otherwise disordered system, consider the hard tetrahedron
fluid studied in reference [55] (Fig. 7). In this system, an important local motif in both the fluid and the glass,
originally identified by visual inspection, is the icosahedron formed by 20 tetrahedra sharing a common vertex. To
identify icosahedra in the system, we first collect all sets of 20 tetrahedra in the system that share a common vertex.
The structural pattern for each cluster is defined by the directions of vectors drawn from the center of the cluster
through the face of each of the 20 tetrahedra, which for an ideal icosahedral cluster results in a dodecahedron. Any
local cluster $i$ that matches the shape of a dodecahedron with $M(\mathbf{S}_i, \mathbf{S}_{\mathrm{dodecahedron}}) > M_{\mathrm{cut}} = 0.9$ is considered
to be in an icosahedral motif. Fig. 7 shows the fraction of tetrahedra that participate in at least one icosahedron
as a function of pressure. Icosahedra are relatively common in the tetrahedral fluid (below $P \approx 62$) and become
more prevalent with increasing density, persisting into the glass if the fluid is compressed too quickly. As the fluid
transforms into a quasicrystal at $P \approx 62$, the fraction of tetrahedra in icosahedra decreases drastically, and vanishes
for the ideal quasicrystal without thermal fluctuations. Although the value of $M_{\mathrm{cut}}$ may affect the absolute number
of icosahedra, the same underlying physical transition is captured for any reasonable value.
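Once each candidate cluster has a score $M(\mathbf{S}_i, \mathbf{S}_{\mathrm{dodecahedron}})$, the quantity plotted in Fig. 7 is a simple bookkeeping step. The sketch below assumes the cluster membership lists and their scores have already been computed elsewhere; the function name and the toy inputs are hypothetical.

```python
import numpy as np

def fraction_in_motif(clusters, scores, n_particles, m_cut=0.9):
    """Fraction of particles that belong to at least one cluster with M > m_cut.

    clusters    : list of index lists, e.g. the 20 tetrahedra sharing a vertex
    scores      : similarity of each cluster to the ideal reference shape
    n_particles : total number of particles in the system
    """
    member = np.zeros(n_particles, dtype=bool)
    for idx, m in zip(clusters, scores):
        if m > m_cut:
            member[np.asarray(idx, dtype=int)] = True  # a particle counts once, however many motifs it joins
    return float(member.mean())
```

Re-running this for a few values of `m_cut` is a cheap way to check the claim above that the qualitative trend does not depend on the precise cutoff.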



D. Example 3: Assembly of a Helical Ribbon 



Another standard application of structural metrics is to track structural transitions, either as a function of time or
of a changing reaction coordinate. This is typically accomplished by monitoring either an order parameter or a correlation
function as the system goes through a transition. Tracking structural transitions is important for a wide variety of
applications, including elucidating thermodynamic transitions [T^ [TTUl4113j and assembly pathways [25, 57, 114, 115].



Many of the advanced molecular simulation techniques used to study transitions [116-120] rely on structural metrics
in the context of pseudo-reaction coordinates [llTj, biasing parameters [116], and collective variables (T^Di to guide the
statistical sampling algorithm. Standard order parameters have been devised for various types of ordering, including
bond orientational ordering [19, 121-123], liquid crystalline ordering [13, 103] such as nematic [124] and smectic flllj
phases, chiral ordering [17], and helical ordering [125]. Time correlation functions based on these types of order
parameters have been applied to creating structural "memory" functions for glassy liquids [126, 127] and to tracking
ordered motifs attaching to a growing quasicrystal nucleus [128].
FIG. 8: Assembly of a helical sheet composed of laterally tethered nanorods [52]. The rods form a bilayer with long attractive
tethers on one side and shorter attractive tethers on the other. As time progresses, the sheet folds into a helix to maximize
the favorable energetic interactions between the longer tethers. The matching order parameter $M_{\mathrm{dist}}(\mathbf{S}_t, \mathbf{S}_{\mathrm{helix}})$ compares
the structure at time $t$ with the shape of the final ideal helical structure.



As a simple example of using shape descriptors to create an order parameter, consider the ribbon-like bilayer
composed of laterally tethered nanorods studied in reference [52], and shown in Fig. 8. The initial sheet or ribbon is
unstable and eventually relaxes into a stable helical structure. We can track this structural transition by matching
the shape of the sheet at a given time $t$ with the final, fully equilibrated helical structure: $M(\mathbf{S}_t, \mathbf{S}_{\mathrm{helix}})$. Since
the structure is 3-dimensional and has radial dependence, we use a Fourier descriptor with 6 radial shells,
$r_s = 10\sigma, 30\sigma, \ldots, 110\sigma$, where $\sigma$ is the distance unit corresponding to a Lennard-Jones particle diameter. Since the
sheet only changes in terms of its twist in space, we save computational effort by only considering points along the
backbone of the sheet. Fig. 8 shows the helical order parameter as a function of time for a long molecular dynamics
run. We observe that the sheet begins to twist from both ends simultaneously, which gives rise to a defect at the
center of the helix, where the periodicities of the two ends are mismatched. This results in a tendency
for the structure to bend to close the defect. The bend persists for many millions of time steps before annealing
into a defect-free helix at around t/τ ≈ 4.5 x 10^. This behavior is well captured by matching the overall shape of
the structure, but is not captured by the more standard descriptor applied in the original reference, which only
measures the degree of helical ordering and gives an essentially constant value for all times after the completion of
twisting at t/τ ≈ 7 x 10^. Using H/^ alone, it would appear that the structure is fully formed at this early time,
which misses the important defect removal behavior that can also be observed by visual inspection.

IV. FUTURE OUTLOOK 

Beyond identifying local and global structures and tracking structural transitions, there are many more applications
of shape matching. In this section, we briefly review some areas in which we are currently applying shape matching
to study self-assembly. Additional details may be found in Ref. [104] and in the individual references cited below.

On-The-Fly Structure Identification: For many assembly applications, such as Bottom-Up-Building-Block-
Assembly (BUBBA) [129], we are interested in cataloguing unique structures. When enumerating unique structures,
it is not typically necessary (or feasible) to define a library of reference structures a priori, as we did for examples 1
and 2 above. Rather, the reference library can be compiled on-the-fly as new structures are encountered (see Fig. 9a).
Each new structure is given a unique identifier, and structures that are duplicates are labeled with the same identifier.
In addition to cluster enumeration schemes, this type of algorithm can potentially be applied to automatically detect
regions of unique ordering in structural phase diagrams.
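One simple way to compile the reference library on-the-fly is a greedy pass: each structure is compared against the library built so far, labeled with the best-matching identifier if the match exceeds a cutoff, and otherwise added as a new entry. The sketch below is one such scheme under stated assumptions (hypothetical function name and cutoff, first-encountered structure kept as the representative); it is not presented as the BUBBA algorithm itself.

```python
import numpy as np

def similarity(s_i, s_j):
    """Eq. (5): M = 1 - 2 |S_i - S_j| / (|S_i| + |S_j|)."""
    s_i, s_j = np.asarray(s_i, float), np.asarray(s_j, float)
    return float(1.0 - 2.0 * np.linalg.norm(s_i - s_j) / (np.linalg.norm(s_i) + np.linalg.norm(s_j)))

def assign_ids(descriptors, m_cut=0.95):
    """Label each structure, growing the reference library whenever a new structure appears.

    Returns (ids, library): one integer identifier per structure, and the
    representative descriptor stored for each unique structure.
    """
    library = []
    ids = []
    for s in descriptors:
        scores = [similarity(s, ref) for ref in library]
        if scores and max(scores) > m_cut:
            ids.append(int(np.argmax(scores)))      # duplicate of a known structure
        else:
            library.append(np.asarray(s, float))    # new structure: add it to the library
            ids.append(len(library) - 1)
    return ids, library
```

Because the first structure of each kind becomes its representative, the labels can depend on the order in which structures are encountered when structures lie close to the cutoff.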

Space/Time Correlation Functions: In example 3 above, we demonstrated how shape matching could be used to
track a structural transition as a function of time or of a reaction coordinate. Another common application of structural
metrics is to characterize how structures change in space. In the context of shape matching, this involves choosing
structures from different points in the system, rather than ideal structures, as reference structures. Spatial correlation
functions are often used to measure structural "correlation lengths." In the condensed matter literature, structural
correlation functions have been defined for crystal-like ordering in 2d [121, 122] and 3d [19], nematic ordering [131],
and many other more specialized types of ordering. Other specialized spatial correlation functions have also been
widely applied. One example is the $q_6$ scheme of references [15, 132], which detects ordered crystal nuclei
based on spatial correlations between local bond order parameters. This scheme can be adapted to identify crystal
nuclei in general by replacing $q_6$, which is only sensitive to particular crystal structures, with other shape descriptors
that are applicable to a particular crystal under investigation. Fig. 9b depicts the formation of a diamond-structured
crystal nucleus (yellow) in a system of patchy particles, identified by replacing $q_6$ with the t — i Fourier coefficient.
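The generalization described above can be sketched as follows: two neighboring particles count as "connected" when their local descriptors match, and a particle is flagged as crystal-like when it has enough connections. This mirrors the structure of the $q_6$ connection-counting idea but swaps in the generic similarity of Eq. (5). The cutoff, the required number of connections, and the function name are hypothetical, and neighbor lists are assumed to be computed elsewhere.

```python
import numpy as np

def similarity(s_i, s_j):
    """Eq. (5): M = 1 - 2 |S_i - S_j| / (|S_i| + |S_j|)."""
    s_i, s_j = np.asarray(s_i, float), np.asarray(s_j, float)
    return float(1.0 - 2.0 * np.linalg.norm(s_i - s_j) / (np.linalg.norm(s_i) + np.linalg.norm(s_j)))

def crystal_like(descriptors, neighbors, m_cut=0.7, min_connections=6):
    """Flag particles whose local descriptor matches enough of their neighbors' descriptors.

    descriptors : one local shape descriptor per particle
    neighbors   : list of neighbor-index lists, one per particle
    """
    flags = []
    for i, nbrs in enumerate(neighbors):
        n_conn = sum(similarity(descriptors[i], descriptors[j]) > m_cut for j in nbrs)
        flags.append(n_conn >= min_connections)
    return np.array(flags, dtype=bool)
```

Clustering the flagged particles with a neighbor criterion would then give the nucleus itself.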

Structure Grouping and Classification: The field of self-assembly involves a wealth of particle building blocks
and the assemblies they form; thus it is sometimes useful to categorize or classify structures based on particular
structural features. For example, reference [3] ranks different building blocks for self-assembly based on their shape
anisotropy. Shape matching methods can provide numerical metrics by which to classify structures. Structures can
be ranked based on the degree to which they exhibit a particular structural feature of interest, or by how well they
match ideal structures exhibiting a particular feature. For example, structures can be ranked based on their 6-fold
symmetry by computing the value of their ℓ = 6 Fourier descriptor, which grows with the degree of 6-fold
symmetry. Similarly, we can create groups of structures that exhibit a particular structural feature by comparing
shape descriptors. One technique for visually grouping similar structures is to plot a matrix
of pairwise similarity values known as a "similarity matrix" or "heat map" [133], as depicted in Fig. 9c for 2d colloidal
clusters [134]. Groups of clusters with similar structural features produce bright blocks, indicating that clusters within
this region of parameter space match well. Grouping objects based on shape similarity has also been applied recently
to macromolecules and proteins fT^ [106].
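A similarity matrix is simply Eq. (5) evaluated for every pair of structures. The vectorized sketch below assumes no descriptor is identically zero (which would make the denominator vanish); the function name is hypothetical, and the result could be displayed as a heat map with any plotting library.

```python
import numpy as np

def similarity_matrix(descriptors):
    """Pairwise Eq. (5) similarities, M[i, j] = 1 - 2 |S_i - S_j| / (|S_i| + |S_j|)."""
    S = np.asarray(descriptors, dtype=float)
    norms = np.linalg.norm(S, axis=1)
    dist = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)   # all pairwise distances
    return 1.0 - 2.0 * dist / (norms[:, None] + norms[None, :])
```

The bright blocks along the diagonal only appear when rows and columns are ordered so that related structures sit next to each other, for example by cluster size as in Fig. 9c.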

Abstract Correlation Functions: Thus far, we have either extended the applicability of standard condensed matter
order parameters and correlation functions by incorporating shape matching, or applied standard shape matching
applications directly in the context of assembled systems. However, in addition to extending existing applications for
use with assembled systems, shape matching allows us to invent new methods that have not yet been explored. For
example, rather than creating correlation functions in space and time as we typically do for condensed matter systems,
we can create abstract correlation functions in parameter space. Fig. 9d depicts a parameter space correlation function
FIG. 9: Potential uses for shape matching in assembly applications. (a) Searching parameter spaces for unique structures. The
panel depicts the Bottom-Up-Building-Block-Assembly (BUBBA) algorithm [129]. (b) Computing spatial correlation functions.
The panel depicts detecting a growing diamond crystal nucleus in a system of patchy particles [37]. (c) Structure grouping and
classification. The panel depicts a similarity matrix (i.e., all of the pairwise similarity values) for 2d clusters of different sizes.
Groups of similar structures are identified by bright boxes along the line y = x. (d) Abstract correlation functions. The panel
depicts a structural phase diagram for the 2d Lennard-Jones Gauss system (top), created by visual inspection [130], compared
with a phase diagram for the same system generated automatically using a shape matching algorithm (bottom).



computed for the 2d Lennard-Jones Gauss system [130], which identifies structural phase boundaries (purple) by finding
points in parameter space that do not match well with their neighboring points. This correlation function is able
to reproduce the structural phase diagram produced in reference [130] by visual inspection of over 5000 independent
configurations. This scheme is just one example of how shape matching algorithms can replace the human element in
searching for target structures and rapidly mapping parameter spaces. The ability to expedite self-assembly research
by automating the study of unique systems may represent one of the most important uses for shape matching moving
forward.
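One reading of "points in parameter space that do not match well with their neighboring points" is sketched below: on a regular grid of state points, each point is compared with its four nearest grid neighbors, and it is flagged as lying on a phase boundary if its worst neighbor similarity falls below a cutoff. The grid layout, the nearest-neighbor stencil, the cutoff value, and the function name are all assumptions; the algorithm used for Fig. 9d may differ.

```python
import numpy as np

def similarity(s_i, s_j):
    """Eq. (5): M = 1 - 2 |S_i - S_j| / (|S_i| + |S_j|)."""
    s_i, s_j = np.asarray(s_i, float), np.asarray(s_j, float)
    return float(1.0 - 2.0 * np.linalg.norm(s_i - s_j) / (np.linalg.norm(s_i) + np.linalg.norm(s_j)))

def boundary_map(descriptors, m_cut=0.9):
    """Flag grid points whose structure matches at least one grid neighbor poorly.

    descriptors : array of shape (nx, ny, D), one descriptor per state point
    Returns a boolean (nx, ny) array that is True on putative phase boundaries.
    """
    S = np.asarray(descriptors, dtype=float)
    nx, ny, _ = S.shape
    worst = np.ones((nx, ny))  # lowest similarity to any grid neighbor so far
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        for x in range(nx):
            for y in range(ny):
                xn, yn = x + dx, y + dy
                if 0 <= xn < nx and 0 <= yn < ny:
                    worst[x, y] = min(worst[x, y], similarity(S[x, y], S[xn, yn]))
    return worst < m_cut
```

Both points on either side of a structural change are flagged, so the detected boundary is about two grid spacings wide.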

Summary: The example applications and shape descriptors that we have provided here represent only a small subset 
of the vast range of possibilities yet to be explored. In the future, the wealth of shape descriptors from the shape 
matching literature should be tested for different classes of particle systems to expand the scope of order parameters 
available to the fields of experimental and computational assembly. New abstract order parameters and correlation
functions, such as the parameter space correlation function of Fig. 9d, can be constructed to expand the algorithms used
to explore new systems. More immediately, the relatively simple algorithms outlined here can be applied to existing
assembled systems to enhance our ability to gain insight into the underlying physics of these complex systems.